Race Conditions


“The race is not always to the swift, nor the 
battle to the strong, but that's the way to bet.” 
- Hugh E. Keough

© 2020 Philip Koopman
Race Conditions


■ Anti-Patterns for Race Conditions:
  • Unprotected access to shared variables
  • Shared variables not declared volatile
  • Not accounting for interrupts and task switching in timing analysis
  • Ignoring non-reproducible faults

■ Race condition: multiple threads compete for shared resources
  • Computation outcome depends upon timing
    - Usually it is infrequent and hard to debug
  • Concurrent access to shared variable
    - Need to lock shared resources
  • Not accounting for multi-tasking
    - Task switch or interrupt causes delays
    - “Starvation” and priority inversion
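The “not declared volatile” anti-pattern can be sketched in C. This is a minimal, host-runnable illustration under assumptions: `calibration_done` is a hypothetical one-shot flag set by a hypothetical interrupt handler `calibration_isr()`, and the poll limit exists only so the sketch cannot hang when run off-target.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-shot flag: written by an interrupt handler, polled by task
   code. volatile makes every read below go to memory; without it, an
   optimizing compiler may read the flag once and reuse that value, so the
   polling loop would never see the ISR's store. */
static volatile bool calibration_done = false;

/* Hypothetical ISR that fires once when a sensor finishes calibrating. */
void calibration_isr(void) {
    calibration_done = true;
}

/* Polls up to max_polls times; returns true once the ISR has run. */
bool wait_for_calibration(unsigned long max_polls) {
    assert(max_polls > 0);
    for (unsigned long i = 0; i < max_polls; i++) {
        if (calibration_done) {
            return true;          /* fresh read of the flag on every iteration */
        }
    }
    return false;                 /* timed out */
}
```

Note that `volatile` only forces each access to touch memory; it does not make a read-modify-write such as `count++` atomic, so a shared counter still needs interrupt masking or a mutex.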
Therac-25 (1985-1987)
Software-Controlled Radiation Therapy Mishaps
■ Problems included:
  • Operators typing “too fast” on the keyboard (8-second window)
  • Safety checks bypassed when a counter rolled over to 0
Concurrency Management Bugs

■ CPU switches among its tasks (multi-tasking)
  • What if switching happens at the wrong time?

■ Concurrency bugs due to shared resources
  • Example: shared global variable, two tasks
    - Task 1 reads the shared variable and computes a new value
    - Task 2 preempts Task 1 and updates the shared variable
    - Task 1 resumes, overwriting Task 2's update
  • Results of a concurrency bug depend upon ordering
    - Usually the bug won't manifest (example: 9)
    - Sometimes the bug will result in a wrong value (example: 6, 8)
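The orderings above can be replayed deterministically in C by writing each schedule out step by step. The numbers are a hypothetical choice, not taken from the slide's figure: the shared variable starts at 5, Task 1 adds 1, and Task 2 adds 3, which gives 9 when the updates do not overlap and 6 or 8 when one update overwrites the other.

```c
#include <assert.h>

/* Hypothetical values: shared starts at 5, Task 1 adds 1, Task 2 adds 3.
   Each task does a read-modify-write through a local copy (t1, t2). */
enum { INITIAL = 5, T1_ADD = 1, T2_ADD = 3 };

/* Task 1 finishes its read-modify-write before Task 2 starts. */
int no_overlap(void) {
    int shared = INITIAL;
    int t1 = shared;            /* Task 1 reads 5 */
    shared = t1 + T1_ADD;       /* Task 1 writes 6 */
    int t2 = shared;            /* Task 2 reads 6 */
    shared = t2 + T2_ADD;       /* Task 2 writes 9 */
    return shared;
}

/* Task 2 preempts Task 1 between Task 1's read and its write. */
int task1_overwrites(void) {
    int shared = INITIAL;
    int t1 = shared;            /* Task 1 reads 5 */
    int t2 = shared;            /* Task 2 preempts and reads 5 */
    shared = t2 + T2_ADD;       /* Task 2 writes 8 */
    shared = t1 + T1_ADD;       /* Task 1 resumes and writes 6: Task 2's update is lost */
    return shared;
}

/* Both tasks read the old value, but Task 2's write lands last. */
int task2_overwrites(void) {
    int shared = INITIAL;
    int t1 = shared;            /* Task 1 reads 5 */
    int t2 = shared;            /* Task 2 reads 5 */
    shared = t1 + T1_ADD;       /* Task 1 writes 6 */
    shared = t2 + T2_ADD;       /* Task 2 writes 8: Task 1's update is lost */
    return shared;
}

/* Same operations in every schedule; only the interleaving differs. */
void check_outcomes(void) {
    assert(no_overlap() == 9);
    assert(task1_overwrites() == 6);
    assert(task2_overwrites() == 8);
}
```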
Mutex for Concurrency Management
■ Easy solution for concurrency bug:
  • Disable interrupts when touching the shared variable
    - Inhibits task switches
    - But keep it very brief to avoid timing problems

[Figure: Task 1 and Task 2 accessing a shared variable]

■ To hold resources longer, use a mutex
  • “Mutual exclusion” flag: true = busy, false = available
  • To access a shared resource:
    - Get the mutex (wait for it to be false, then set it to true)
    - Access the shared resource
    - Other tasks wait while the mutex is locked (resource busy)
    - When done, set the mutex to false to release the resource

  • Mutexes are themselves a special type of shared variable
    - And therefore subject to race conditions!
    - Getting them right is tricky; let the RTOS do this for you
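To see why getting a mutex right is tricky, here is a C11 sketch contrasting a naive flag (“wait for it to be false, then set it to true” as two separate steps) with one that uses an atomic exchange. It is an illustrative spinlock, not an RTOS mutex: `naive_lock`, `spin_lock`, and `add_to_total` are hypothetical names, and a real RTOS mutex blocks waiting tasks rather than spinning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* BROKEN: the wait and the set are two separate steps. A task switch after
   the loop exits but before the store lets two tasks both see "available"
   and both enter the critical section. */
static volatile bool naive_busy = false;

void naive_lock(void) {
    while (naive_busy) { /* wait */ }
    naive_busy = true;            /* another task may also have passed the loop */
}

void naive_unlock(void) { naive_busy = false; }

/* Atomic version: atomic_exchange_explicit() stores true and returns the old
   value in one indivisible step, so exactly one caller sees "was false". */
static atomic_bool busy = false;

bool try_lock(void) {
    return !atomic_exchange_explicit(&busy, true, memory_order_acquire);
}

void spin_lock(void) {
    while (!try_lock()) { /* busy-wait; an RTOS mutex would block the task instead */ }
}

void spin_unlock(void) {
    assert(atomic_load_explicit(&busy, memory_order_relaxed)); /* catches unlocking a free lock */
    atomic_store_explicit(&busy, false, memory_order_release);
}

/* Shared resource used with the get / access / release protocol. */
static int shared_total = 0;

int add_to_total(int delta) {
    spin_lock();                  /* get the mutex */
    shared_total += delta;        /* access the shared resource */
    int result = shared_total;
    spin_unlock();                /* release it for other tasks */
    return result;
}
```

On a single-core, preemptive system a spinlock like this has a further problem: a high priority task spinning on a lock held by a lower priority task never lets the holder run to release it. That is one more reason to use the RTOS's mutex.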
Bounded Priority Inversion
■ Minimize the time interrupts are disabled
  • Disabling interrupts delays task switching
  • Blocking time: high priority tasks can miss deadlines
■ Mutexes indirectly cause blocking time
  • Priority inversion: a low priority task blocks a high priority task
    - Locked mutex prevents the high priority task from making progress
    - Only affects tasks that actually use the mutex, not all tasks
    - BUT... there is a critical problem (next slide)

[Figure: Bounded Priority Inversion timeline]
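One way to keep the interrupts-disabled window short is to mask interrupts only long enough to copy the shared data, then do the processing with interrupts enabled. The sketch below assumes a hypothetical sample buffer written by an ADC interrupt handler; `disable_interrupts()` and `enable_interrupts()` are host-side stand-ins for the platform's real mask operations (on Arm Cortex-M with CMSIS, for example, `__disable_irq()` and `__enable_irq()`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the platform's interrupt-mask operations.
   They only track state so the sketch runs on a host. */
static volatile bool irq_enabled = true;
static void disable_interrupts(void) { irq_enabled = false; }
static void enable_interrupts(void)  { irq_enabled = true; }

/* Hypothetical sample buffer shared with an ADC interrupt handler. */
enum { NSAMPLES = 8 };
static volatile uint16_t samples[NSAMPLES];

/* Host-side stand-in for the ADC ISR (real handlers take no arguments).
   Task code cannot preempt an ISR, so the ISR side needs no masking here. */
void adc_isr_store(unsigned index, uint16_t value) {
    samples[index % NSAMPLES] = value;
}

/* Mask interrupts only for the copy; do the slower work afterward. */
uint32_t average_samples(void) {
    uint16_t copy[NSAMPLES];
    assert(irq_enabled);          /* simple version: must not be called with interrupts masked */
    disable_interrupts();
    for (int i = 0; i < NSAMPLES; i++) {
        copy[i] = samples[i];     /* short, bounded masked region */
    }
    enable_interrupts();

    uint32_t sum = 0;             /* interrupts are enabled again during processing */
    for (int i = 0; i < NSAMPLES; i++) {
        sum += copy[i];
    }
    return sum / NSAMPLES;
}

bool interrupts_enabled(void) { return irq_enabled; }
```

Because this simple version re-enables interrupts unconditionally, production code commonly saves the previous mask state and restores it instead, so the routine is safe to call from code that already has interrupts disabled.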
Unbounded Priority Inversion
■ Priority inversion can be unbounded for three tasks:
  • Medium priority task blocks high priority task without ever touching the mutex:

[Figure: Unbounded Priority Inversion timeline by task priority. Legend: Normal Execution; Mutex Locked (Critical Section); Fails To Get Mutex]
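A tick-by-tick simulation can make the “unbounded” part concrete. Everything in it is a hypothetical model: one-tick time steps, a Low task that holds the mutex for 3 ticks, a High task that needs it for 1 tick, and a Medium task whose amount of work is a parameter. The `inherit` flag turns on priority inheritance, so the same sketch also previews the fix.

```c
#include <assert.h>
#include <stdbool.h>

enum { NTASKS = 3, NO_OWNER = -1, HIGH = 2 };

typedef struct {
    int base_prio;  /* larger number = higher priority */
    int arrival;    /* tick at which the task becomes ready */
    int before;     /* ticks of ordinary work left before it needs the mutex */
    int cs;         /* ticks left inside the critical section (0 = never locks) */
    int finish;     /* tick at which the task completed, or -1 if not yet */
} Task;

/* Blocked: ready for its critical section, but another task holds the mutex. */
static bool is_blocked(const Task *t, int i, int owner) {
    return t[i].before == 0 && t[i].cs > 0 && owner != NO_OWNER && owner != i;
}

/* With inheritance, the mutex owner runs at its highest waiter's priority. */
static int effective_prio(const Task *t, int i, int owner, int now, bool inherit) {
    int p = t[i].base_prio;
    if (inherit && i == owner) {
        for (int j = 0; j < NTASKS; j++) {
            bool waiting = t[j].arrival <= now && t[j].finish < 0 && is_blocked(t, j, owner);
            if (waiting && t[j].base_prio > p) p = t[j].base_prio;
        }
    }
    return p;
}

/* Returns the High task's response time (finish - arrival) in ticks. */
int high_task_response(int medium_work, bool inherit) {
    assert(medium_work > 0);   /* the Medium task model needs at least one tick of work */
    Task t[NTASKS] = {
        {1, 0, 0, 3, -1},            /* Low: locks the mutex at tick 0, holds it 3 ticks */
        {2, 1, medium_work, 0, -1},  /* Medium: ordinary work, never touches the mutex */
        {3, 1, 0, 1, -1},            /* High: needs the mutex as soon as it arrives */
    };
    int owner = NO_OWNER;
    for (int now = 0; now < 10000; now++) {
        int run = -1, best = -1;
        for (int i = 0; i < NTASKS; i++) {   /* pick the highest-priority runnable task */
            if (t[i].arrival > now || t[i].finish >= 0 || is_blocked(t, i, owner)) continue;
            int p = effective_prio(t, i, owner, now, inherit);
            if (p > best) { best = p; run = i; }
        }
        if (run < 0) continue;               /* idle tick */
        Task *r = &t[run];
        if (r->before > 0) {
            r->before--;                     /* ordinary work */
        } else {
            owner = run;                     /* lock (already held if mid-critical-section) */
            if (--r->cs == 0) owner = NO_OWNER;  /* unlock at end of critical section */
        }
        if (r->before == 0 && r->cs == 0) r->finish = now + 1;
        if (t[HIGH].finish >= 0) return t[HIGH].finish - t[HIGH].arrival;
    }
    return -1;   /* did not finish within the simulated horizon */
}
```

Without inheritance, High's response time grows tick for tick with Medium's work (8 ticks when Medium needs 5, 53 when it needs 50); with inheritance it stays at 3 ticks regardless.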
Priority Inheritance
■ Solution to unbounded priority inversion: priority inheritance
  • A task holding a mutex is elevated to the priority of the highest priority task waiting for it; its priority is restored when it frees the mutex
  • This is complicated! Let the RTOS handle it

[Figure: Bounded Priority Inversion timeline by task priority, with priority inheritance. Legend: Normal Execution; Mutex Locked (Critical Section); Fails To Get Mutex]
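“Let the RTOS handle it” usually means turning on an option when the mutex is created. As one concrete example, POSIX threads expose priority inheritance as a mutex attribute (`PTHREAD_PRIO_INHERIT`, available where the implementation supports the Thread Priority Inheritance option); VxWorks offers a comparable `SEM_INVERSION_SAFE` option for mutual-exclusion semaphores. The wrapper `init_inheriting_mutex` is a hypothetical helper, shown only to sketch the POSIX calls (compile with `-pthread`).

```c
#define _XOPEN_SOURCE 700   /* request POSIX.1-2008/XSI declarations, incl. mutex protocols */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Initializes *m as a mutex that uses priority inheritance: while a task
   holds it, that task runs at the priority of the highest-priority task
   blocked on it. Returns 0 on success, else the error from the failing call. */
int init_inheriting_mutex(pthread_mutex_t *m) {
    assert(m != NULL);
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0) rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);   /* already-initialized mutexes are unaffected */
    return rc;
}
```

Lock and unlock calls are unchanged; the inheritance behavior comes entirely from how the mutex was created, which is why forgetting the option (as on Pathfinder) is easy to miss in code review.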
Mars Pathfinder Incident
■ July 4, 1997: Pathfinder lands on Mars
  • First US Mars landing since the Vikings in 1976; first rover
■ But, a few days later...
  • Multiple system resets occur (flight software runs on the VxWorks RTOS)
    - Watchdog timer saves the day! Sets system to safe state
    - Reproduced on ground; patch uploaded to fix it
  • Scenario pretty much identical to the High/Medium/Low priority picture
    - Developers didn't have priority inheritance turned on!
    - Why? “The data bus task executes very frequently and is time-critical -- we shouldn't spend the extra time in it to perform priority inheritance” [Jones07]

https://goo.gl/W5wHrU
Best Practices for Avoiding Race Conditions

■ Always consider task interactions
  • What if a task switch happens at a bad time?
  • What if tasks read data at different times?
  • What if a half-formed data structure is read?
  • What if multiple writers compete for data?
  • Use RTOS services to help

■ Pitfalls:
  • Failing to use interrupt masking or mutexes
    - Failing to deal with unbounded priority inversion
    - Failing to declare shared variables volatile
  • Assuming that non-reproducible problems aren't bugs
  • Trying to write your own bullet-proof concurrency services
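The “half-formed data structure” question can be illustrated with a two-field record whose fields must agree. In this hypothetical sketch, `Range` must always satisfy `lo <= hi`; an update takes two stores, so a reader that runs between them could see, for example, `{20, 10}`. A POSIX mutex stands in for the RTOS mutex: the writer changes both fields inside one critical section, and the reader copies the whole record under the same lock before using it.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical two-field record with an invariant: lo <= hi.
   Updating it takes two stores, so between them the record is half-formed. */
typedef struct { int lo; int hi; } Range;

static Range shared_range = {0, 10};
static pthread_mutex_t range_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer: both fields change inside one critical section. */
void set_range(int lo, int hi) {
    assert(lo <= hi);
    pthread_mutex_lock(&range_lock);
    shared_range.lo = lo;   /* {0,10} -> {20,30}: after this store the record is {20,10} */
    shared_range.hi = hi;
    pthread_mutex_unlock(&range_lock);
}

/* Reader: snapshot the whole record under the same lock, then use the copy. */
Range get_range(void) {
    pthread_mutex_lock(&range_lock);
    Range copy = shared_range;
    pthread_mutex_unlock(&range_lock);
    assert(copy.lo <= copy.hi);   /* holds as long as every writer uses set_range() */
    return copy;
}
```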


18-348 lecture explaining mutex operation: https://goo.gl/WH9Q44


